Make Ingest Workload Production Ready #7

Merged
DanielBlei merged 8 commits into main from develop on Sep 14, 2025

DanielBlei (Owner) commented on Sep 14, 2025

Pull Request

Description

This PR enhances the Pipeline Forge ingest workload, turning it from a basic prototype into a production-ready data ingestion system. The changes span infrastructure setup, security improvements, test coverage, and comprehensive documentation updates.

Implementation Notes

  • Security: Integrated Google Cloud Secret Manager for secure credential handling
  • Infrastructure: Added Terraform modules for GCP sandbox environment (BigQuery, IAM, secrets)
  • Testing: Added a comprehensive test suite
  • Architecture: Refactored target inheritance and added connection validation and a dry-run mode
  • Documentation: Streamlined READMEs and added complete docstring coverage
  • Developer Experience: Added new Makefile targets (install, run-help, test-coverage)
  • Code Quality: Fixed formatting issues and enhanced error handling with retry logic
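The notes above mention retry logic without showing it. The sketch below is one common shape for it, exponential backoff around a callable. `with_retry`, its parameters, its defaults, and the retried exception types are assumptions for illustration, not the code merged in this PR.

```python
import logging
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying transient errors with exponential backoff (hypothetical helper)."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == attempts:
                # Out of retries: log the stack trace and re-raise the original error.
                logger.exception("Giving up after %d attempts", attempts)
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 0.5s, 1.0s, 2.0s, ...
            logger.warning("Attempt %d failed (%s); retrying in %.1fs", attempt, exc, delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

Exceptions outside `retry_on` (for example a `ValueError` from bad config) propagate immediately, so only failures that are plausibly transient are retried. Injecting `sleep` keeps the helper testable without real delays.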

Testing

  • Code builds and runs locally
  • Relevant tests added or updated
  • Manual/automated tests performed

Checklist

  • Code style is consistent with the project
  • Documentation updated if needed
  • Ready for review

- Implement validate_connection() for sources and targets
- Add --dry-run flag to prevent actual data loading
- Enhance factory functions with connection validation
- Improve error handling and logging

- Redesign the TargetInterface and Target base class for better separation of concerns
- Move common validation logic (data/table validation) to the base Target class
- Update BigQueryTarget to properly extend the base class using super().load()
- Standardize method signatures across target implementations
- Remove the client abstraction from the base class, keeping clients target-specific
- Improve parameter naming consistency (target_table vs. table)
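To make the target refactor and dry-run mode concrete, here is a minimal sketch under simplified assumptions. `TargetInterface`, `Target`, `validate_connection()`, `load()`, `target_table`, `super().load()`, and the `--dry-run` flag come from the commit messages above; `InMemoryTarget` (standing in for `BigQueryTarget`), `create_target`, `parse_args`, and every method signature are hypothetical.

```python
import argparse
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


class TargetInterface(ABC):
    """Define the contract every target implements (signatures assumed)."""

    @abstractmethod
    def validate_connection(self) -> bool:
        """Return True if the backend is reachable."""

    @abstractmethod
    def load(self, data: List[Row], target_table: str) -> int:
        """Load rows into target_table and return the row count."""


class Target(TargetInterface):
    """Hold validation shared by every concrete target."""

    def __init__(self, dry_run: bool = False) -> None:
        self.dry_run = dry_run

    def load(self, data: List[Row], target_table: str) -> int:
        # Common data/table checks live here instead of in each subclass.
        if not target_table:
            raise ValueError("target_table must be a non-empty string")
        if not data:
            raise ValueError("no rows to load")
        return len(data)


class InMemoryTarget(Target):
    """Stand in for a concrete target such as BigQueryTarget (hypothetical)."""

    def __init__(self, dry_run: bool = False) -> None:
        super().__init__(dry_run)
        self.tables: Dict[str, List[Row]] = {}  # target-specific storage/"client"

    def validate_connection(self) -> bool:
        # A real target would make a cheap round trip to its backend here.
        return True

    def load(self, data: List[Row], target_table: str) -> int:
        row_count = super().load(data, target_table)  # shared validation first
        if self.dry_run:
            return row_count  # validated, but nothing is written
        self.tables.setdefault(target_table, []).extend(data)
        return row_count


def create_target(dry_run: bool = False) -> Target:
    """Build a target and fail fast if its connection check does not pass."""
    target = InMemoryTarget(dry_run=dry_run)
    if not target.validate_connection():
        raise ConnectionError("target connection check failed")
    return target


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse CLI flags; only --dry-run is shown here."""
    parser = argparse.ArgumentParser(description="Ingest workload (sketch)")
    parser.add_argument("--dry-run", action="store_true",
                        help="validate connections and data without loading")
    return parser.parse_args(argv)
```

Typical wiring would be `target = create_target(dry_run=parse_args().dry_run)`. Checking `dry_run` only after `super().load()` means a dry run still exercises the shared validation; the PR does not say exactly where its own check sits.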

Add Terraform configuration for the GCP sandbox environment, including:
- BigQuery dataset
- IAM roles and service accounts
- Secret management configuration
- Main infrastructure definitions
- Variable definitions for environment configuration

- Add secret handler module for GCP Secret Manager integration
- Refactor config to support secret references instead of plain-text passwords
- Update source creation to resolve secrets before connecting to the database
- Add google-cloud-secret-manager dependency
- Remove default values from docker-compose environment variables
- Add project_number field to BigQuery target configuration
- Improve error logging with stack traces in the main processing loop
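A minimal sketch of resolving secret references before opening a database connection. The `<key>_secret` config convention and the names `resolve_secrets`, `secret_version_name`, and `gcp_fetcher` are hypothetical; the calls inside `gcp_fetcher` (`SecretManagerServiceClient`, `access_secret_version`) are the real `google-cloud-secret-manager` client API. Injecting the fetcher keeps the resolver testable without GCP credentials.

```python
from typing import Any, Callable, Dict

SecretFetcher = Callable[[str], str]


def secret_version_name(project: str, secret_id: str, version: str = "latest") -> str:
    """Build a Secret Manager resource name; project may be an ID or a number."""
    return f"projects/{project}/secrets/{secret_id}/versions/{version}"


def gcp_fetcher() -> SecretFetcher:
    """Return a fetcher backed by Secret Manager (needs GCP credentials)."""
    # Imported lazily so the resolver can be tested without GCP packages installed.
    from google.cloud import secretmanager

    client = secretmanager.SecretManagerServiceClient()

    def fetch(name: str) -> str:
        response = client.access_secret_version(request={"name": name})
        return response.payload.data.decode("UTF-8")

    return fetch


def resolve_secrets(config: Dict[str, Any], project: str, fetch: SecretFetcher) -> Dict[str, Any]:
    """Return a copy of config with `<key>_secret` references replaced by `<key>` values.

    The `_secret` suffix convention is hypothetical, not the PR's actual schema.
    """
    resolved: Dict[str, Any] = {}
    for key, value in config.items():
        if key.endswith("_secret"):
            plain_key = key[: -len("_secret")]
            resolved[plain_key] = fetch(secret_version_name(project, value))
        else:
            resolved[key] = value
    return resolved
```

In production the call would look like `resolve_secrets(cfg, project, gcp_fetcher())`; the original config is left untouched, so plaintext values never need to be written back to disk.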

…ucture

- Add a comprehensive test suite with 12 new test files covering core, components, integration, and error handling
- Implement Google Cloud Secret Manager integration for secure credential handling
- Add Terraform infrastructure for GCP sandbox environment (BigQuery, IAM, secrets)
- Enhance BigQuery target with improved error handling and configuration
- Refactor core components (catalog, config, extractors, sources) for better modularity
- Add new dependencies: pytest-mock, opentelemetry, google-cloud-secret-manager, pytest-cov
- Remove obsolete devcontainer configuration and hello world test
- Update Docker and Makefile configurations for improved development workflow

Add comprehensive docstring coverage across all modules, classes, and methods.
Fix docstring formatting issues involving trailing periods, blank lines, and imperative mood.
Ensure compliance with pydocstyle rules.
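For reference, a docstring that satisfies the pydocstyle rules named above: an imperative summary line ending in a period, followed by a blank line. The function and its Google-style `Args`/`Returns`/`Raises` sections are illustrative assumptions; the PR does not say which docstring convention the project uses.

```python
def qualify_table(project: str, dataset: str, table: str) -> str:
    """Return the fully qualified BigQuery table ID.

    Args:
        project: GCP project ID.
        dataset: BigQuery dataset name.
        table: Table name within the dataset.

    Returns:
        The ID in ``project.dataset.table`` form.

    Raises:
        ValueError: If any component is empty.
    """
    parts = (project, dataset, table)
    if not all(parts):
        raise ValueError("project, dataset and table must all be non-empty")
    return ".".join(parts)
```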

DanielBlei changed the title from "Develop" to "Make Ingest Workload Production Ready" on Sep 14, 2025
DanielBlei self-assigned this on Sep 14, 2025
DanielBlei merged commit 207037d into main on Sep 14, 2025 (4 checks passed)
DanielBlei deleted the develop branch on Sep 14, 2025, 10:12

- Streamline main README with clearer navigation and architecture overview
- Add comprehensive workloads README with component descriptions
- Enhance ingest workload documentation with usage examples and configuration
- Add new Makefile targets for install and run-help commands
- Improve project structure visibility and quick start instructions

Fix formatting and type conflicts